IEEE/ACM Transactions on Computational Biology and Bioinformatics — Latest Matching Preprints

1

HetNetEX: Exact Asymptotic Inference in Heterogeneous Biomedical Knowledge Graphs

Ghosh, T.; Gillenwater, L. A.; Greene, C. S.; Costello, J. C.

2026-07-10 systems biology 10.64898/2026.07.05.736581 medRxiv

Top 0.2%

4.0%

Heterogeneous biomedical knowledge networks (hetnets) integrate disparate data types (drugs, genes, diseases, and pathways) across independent sources; Hetionet (https://het.io) is a widely used example. A standard approach for assessing connectivity significance is XSwap, which permutes the hetnet P times and fits a gamma-hurdle null model to the degree-weighted path count (DWPC), pooling permuted values across pairs with matching source and target degrees to increase the effective sample size. This permutation approach has been highly successful in practice, but it faces four practical constraints in large graphs: (1) a finite resolution for the smallest reportable p-values, (2) computational cost that grows prohibitive at path lengths L ≥ 4 or 5, (3) a variance model (Var ∝ μ²) that departs from the configuration-model form (1 + κ)μ, and (4) O(P · 10m · L) runtime. To complement this approach, we present HetNetEX (Heterogeneous Network EXact inference), which computes the null DWPC distribution analytically from degree sequences using the configuration model in O(Ln) time. In simulations at P = 200 across L = 1-4, HetNetEX achieves Spearman ρ > 0.96 concordance with XSwap rankings while being >10,000x faster and providing analytical p-values without a resolution ceiling. High-degree pairs show larger XSwap sampling error than low-degree pairs, reflecting the finite-sample nature of permutation that analytical computation avoids.

2

CerViX-Net: A Multi-Branch Fusion of Vision Transformer and Convolutional Neural Networks for Cervical Cancer Detection using Cytology Images

De, S.

2026-06-24 radiology and imaging 10.64898/2026.06.24.26356425 medRxiv

Top 0.3%

3.1%

Cervical cancer represents a pressing global health challenge, emphasizing the critical need for accurate and timely diagnostic methods to facilitate effective treatment and improve survival rates. In response to this challenge, the study presents CerViX-Net, an innovative classification framework designed to advance cervical cancer detection through enhanced computational efficiency and diagnostic accuracy. The development of CerViX-Net is motivated by the limitations of traditional diagnostic models, particularly in handling the computational and memory demands of large-scale data, while ensuring precise feature extraction and classification. CerViX-Net employs a hybrid deep learning architecture that combines the capabilities of ResNet50, EfficientNet-B0, and a Modified Vision Transformer (ViT) module. The ResNet50 branch extracts hierarchical features through stacked convolutional and identity blocks. In another path, the modified ViT module transforms image patches via linear projection, augments them with positional and class embeddings, and processes them using Parallel Transformer Encoder layers to model contextual relationships. Concurrently, EfficientNet-B0 utilizes MBConv blocks to extract multi-scale representations. The feature outputs from all three branches are integrated and passed through a classification head consisting of dropout layers and dense layers to ensure robust and accurate predictions. The proposed framework is rigorously evaluated on the Mendeley LBC dataset, achieving exceptional performance metrics with an accuracy of 99.69%, precision of 99.28%, recall of 99.48%, and an F1-score of 99.52%. The robustness of CerViX-Net is further validated on the SIPaKMeD and Herlev Pap Smear datasets, where it demonstrates comparable excellence, underscoring its efficacy and adaptability across diverse cytology datasets. Statistical validation using Friedman's test further reinforces its superiority over competing methods.

3

Binary search and set operations on compacted k-mer lists

Dufresne, Y.; Andreace, F.

2026-07-03 bioinformatics 10.64898/2026.06.29.735436 medRxiv

Top 0.7%

1.3%

Sorted lists of elements are particularly good for computing set operations. A single scan of the two lists is sufficient to materialize or count the results of the union, intersection, difference, and xor operators. In bioinformatics, only a few tools are designed to perform these operations on k-mers. A fast tool like KMC allows set operations at the cost of storing individual k-mers. In this paper, we introduce a novel way to represent sorted k-mers as a collection of recomposed super-k-mer sorted lists. We introduce the concept of virtual super-k-mer and show how to construct, query and perform set operations on sorted lists of virtual super-k-mers. In the implementation sklib, we demonstrate high throughput of the data structure for construction and set operations, while remaining competitive in query capabilities, within a controlled memory footprint (2-5x decrease in bits/element compared to KMC).
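
The single-scan idea in the first two sentences of this abstract can be sketched as a classic two-pointer merge. This is only an illustration on plain sorted integer lists (for instance, k-mers encoded as integers, which is an assumption here); it is not the virtual super-k-mer structure or the sklib implementation, and the names `merge_scan` and `set_ops` are hypothetical.

```python
from typing import Iterator, Sequence


def merge_scan(a: Sequence[int], b: Sequence[int]) -> Iterator[tuple[int, bool, bool]]:
    """Scan two sorted, duplicate-free lists once.

    Yields (value, in_a, in_b) for every distinct value, in sorted order.
    """
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] < b[j]:
            yield a[i], True, False
            i += 1
        elif a[i] > b[j]:
            yield b[j], False, True
            j += 1
        else:  # value present in both lists
            yield a[i], True, True
            i += 1
            j += 1
    # At most one of the two tails is non-empty.
    for x in a[i:]:
        yield x, True, False
    for y in b[j:]:
        yield y, False, True


def set_ops(a: Sequence[int], b: Sequence[int]) -> dict[str, list[int]]:
    """Materialize union, intersection, difference (a minus b), and xor in one pass."""
    out: dict[str, list[int]] = {"union": [], "intersection": [], "difference": [], "xor": []}
    for value, in_a, in_b in merge_scan(a, b):
        out["union"].append(value)
        if in_a and in_b:
            out["intersection"].append(value)
        if in_a and not in_b:
            out["difference"].append(value)
        if in_a != in_b:
            out["xor"].append(value)
    return out


result = set_ops([1, 3, 5, 7], [3, 4, 7, 9])
```

Counting instead of materializing only requires replacing the `append` calls with counters; the scan itself stays linear in the combined length of the two lists.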

4

A foundation model enables prediction of natural product molecular properties, bioactivity, and structural similarity from biosynthetic gene cluster sequence

Walker, A.

2026-07-07 bioinformatics 10.64898/2026.07.05.736569 medRxiv

Top 0.7%

1.2%

Genome mining is a powerful technique in natural product discovery, where biosynthetic gene clusters that are likely to produce novel or desirable natural products are identified through bioinformatic analysis. There are many more predicted biosynthetic gene clusters than can easily be experimentally characterized. Additional computational methods to prioritize biosynthetic gene clusters by the bioactivity, structural properties, or novelty of the product would make genome mining more efficient. Multiple machine learning/artificial intelligence models have been developed to predict product properties from biosynthetic gene cluster sequence, but they are limited by small quantities of training data. Model pretraining with unlabeled data is a powerful technique to develop models that can learn on a limited amount of labeled training data. Biosynthetic gene clusters are well suited to this strategy because there are many predicted clusters with only a small percentage being characterized. This paper reports BGC-MLM, a foundation model that is pretrained with a masked language task on predicted biosynthetic gene clusters and then fine-tuned for downstream applications including prediction of product structural class, bioactivity, chemical properties, counts of functional groups, and chemical fingerprint. Comparison to a model trained without pretraining shows that pretraining generally improves performance. BGC-MLM shows better or similar performance to existing specialized methods for these tasks, demonstrating its utility as a foundation model for natural product genome mining.

5

Parameter-efficient deep learning for pneumonia detection on chest X-rays: A comparative evaluation of explainable AI methods

Mahtabi, B.; Nasr-Esfahani, E.; Yaraghi, S.

2026-07-16 radiology and imaging 10.64898/2026.07.14.26358065 medRxiv

Top 0.9%

1.0%

Pneumonia is a leading cause of infectious disease mortality worldwide, accounting for approximately 2.5 million deaths annually and 15% of deaths in children under five. Chest X-ray imaging remains the primary diagnostic tool, but accurate interpretation requires radiological expertise that is disproportionately concentrated in high-income settings, creating a diagnostic gap where disease burden is highest. Automated deep learning offers a scalable complement to specialist-dependent diagnosis, yet clinical adoption requires both high accuracy and transparent, interpretable reasoning. Convolutional neural networks (CNNs) have shown strong potential for pneumonia detection from chest X-rays, but two barriers impede clinical translation: the interpretability of black-box models and the computational feasibility of large architectures in resource-constrained settings. Explainable AI (XAI) methods such as Grad-CAM, Grad-CAM++, and Score-CAM address the interpretability barrier, yet systematic quantitative comparisons across multiple CNN architectures remain scarce. Furthermore, CNN architectures widely used for medical image classification carry high parameter counts that limit feasibility in resource-constrained settings, motivating architectures that achieve competitive accuracy with substantially fewer parameters. Here we propose a parameter-efficient deep learning framework for pneumonia detection based on transfer learning, evaluated across three CNN architectures representing distinct architectural families: EfficientNet-B0 with fine-tuning (proposed method), ResNet50, and DenseNet121, trained under identical conditions on the Kaggle chest X-ray dataset (5,863 images). Our method achieved 90% classification accuracy, outperforming both baselines while requiring 4.8x fewer parameters than ResNet50. 
To evaluate explainability, Grad-CAM, Grad-CAM++, and Score-CAM were applied across all three architectures and compared quantitatively using Intersection over Union against manually annotated lung segmentation masks, Insertion score, and Deletion score, with pairwise statistical validation via Wilcoxon signed-rank tests and Bonferroni correction. Findings show that classification accuracy and XAI explanation quality must be evaluated independently, and that the proposed parameter-efficient architecture offers a favorable trade-off for resource-constrained clinical deployment.
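
Two of the quantitative ingredients named in this abstract, Intersection over Union against a segmentation mask and Bonferroni correction of pairwise p-values, are simple enough to sketch. The abstract does not say how saliency maps are binarized, so the 0.5 threshold below is an assumption, and the function names are hypothetical rather than taken from the authors' code.

```python
from typing import Sequence

Grid = Sequence[Sequence[float]]


def binarize(saliency: Grid, threshold: float = 0.5) -> list[list[int]]:
    """Turn a saliency map into a 0/1 mask (threshold is an assumed choice)."""
    return [[1 if v >= threshold else 0 for v in row] for row in saliency]


def iou(mask_a: Grid, mask_b: Grid) -> float:
    """Intersection over Union of two binary masks of identical shape."""
    if len(mask_a) != len(mask_b) or any(len(ra) != len(rb) for ra, rb in zip(mask_a, mask_b)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Two empty masks have no union; report 0.0 rather than dividing by zero.
    return inter / union if union else 0.0


def bonferroni(p_values: Sequence[float]) -> list[float]:
    """Bonferroni-adjusted p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


saliency = [[0.9, 0.2], [0.7, 0.1]]   # toy 2x2 saliency map
lung_mask = [[1, 1], [0, 0]]          # toy annotated mask
score = iou(binarize(saliency), lung_mask)
adjusted = bonferroni([0.01, 0.04, 0.5])
```

In the toy example the binarized map and the mask share one pixel out of three covered pixels, so the score is one third.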

6

Calibrating machine learning approaches for probability estimation without calibration data

Di Carluccio, E.; Koliopanos, G.; Ojeda, F. M.; Weimar, C.; Ziegler, A.

2026-07-13 epidemiology 10.64898/2026.07.10.26357723 medRxiv

Top 1.0%

1.0%

Statistical prediction models for binary outcomes are becoming increasingly popular. One significant challenge is calibrating these models to suit the characteristics of a target population that is structurally different from the original population. Calibration is especially challenging when there is no training data available from the target population. To address this problem, we propose a novel calibration method, SimCal, which uses synthetic data generated from the model development data in conjunction with marginal statistics from the calibration cohort. We show that expert judgment modeling (EJM) may be used for calibration if cross-sectional data from the target population are available comprising expert judgments about the potential outcome and the covariates. We describe three alternative calibration approaches when calibration data are lacking: similarity-binning averaging (SBA), adaptive calibration of predictions (ACP), and Elkan calibration. In a simulation study, we compare SBA, ACP, Elkan calibration, and SimCal. R code for applying these methods is provided from the re-analysis of data on coronary artery disease. We illustrate all 5 calibration approaches with a real data set for predicting functional outcome after stroke and all approaches but EJM in the re-analysis of the Cleveland Clinic data. None of the approaches performed convincingly well in all situations. SimCal performed well when model parameters were correctly specified. EJM failed on the stroke data. Further research is urgently required for calibration in the absence of calibration data.

7

Towards a Unified Exact Solution of Rearrangement Small Parsimony for Natural Genomes

Bohnenkaemper, L.; Frolova, D.

2026-06-28 bioinformatics 10.64898/2026.06.23.733974 medRxiv

Top 1.0%

0.9%

Phylogenetic reconstruction is a fundamental problem in comparative genomics. As a theoretical problem in rearrangement studies, this has been modelled as the Small Parsimony Problem (SPP), in which ancestral genome structures have to be determined minimizing the number of rearrangement events occurring throughout the phylogeny. This problem is of significant interest in microbial and cancer genomics, due to the prevalence and clinical importance of rearrangement events. Genome structures in this problem are expressed as sequences of markers, which are themselves oriented sequence features (such as genes) that abstract from non-structural variations. Recent research has focused on the problem under the natural genomes model, in which arbitrary variations in copy number of markers are allowed. Natural genomes are often studied under the DCJ-indel model, a model which has already been successfully applied to plasmid data. There also exist ILP solutions to a variant of the Small Parsimony Problem under the DCJ-indel model. However, these solutions are limited in their applicability, as they make some critical simplifications for tractability purposes: ancestral marker frequencies and precomputed putative ancestral adjacencies, with their predicted likelihoods, are assumed as input. This creates multiple problems from both a theoretical and practical perspective. Firstly, this simplification means that the full state space is not searched for a solution, but rather only the subset of genomes with the precomputed putative adjacencies, meaning an optimal solution to the exact SPP is not guaranteed. Secondly, marker frequencies are given externally, without any theoretical guarantees. Thirdly, the method used to precompute adjacencies relies on gene trees, which requires the use of genes as markers, when gene annotation is often unreliable, especially in regions with a lot of rearrangement.
Additionally, this restricts the applicability of the approach to sets of genomes that are both divergent and large enough to be able to produce informative gene trees. This is, for example, rarely the case for plasmids, where nucleotide mutations are rarer than rearrangements and genomes are small. Hence, we revisit the problem to solve the exact SPP by introducing a cost to indel operations, which allows us to compute ranges of marker frequencies and derive theoretical results, that allow us to reduce the solution space that the ILP searches without sacrificing optimality. We show that this makes the problem tractable for the case of small and recently related genomes, first on simulated genomes, and then on a set of pathogenic plasmids which represent a realistic use case for the method.

8

BOSE: A Bayesian Order Statistics-Based Estimator for Recovering the Sample Mean and Standard Deviation

Pan, W.; Lu, Z.; Jiang, W.; Lim, J.; Xu, L.; Wang, X.

2026-07-01 bioinformatics 10.64898/2026.06.26.734829 medRxiv

Top 1%

0.9%

In meta-analyses of continuous outcomes, the sample mean and standard deviation (SD) are essential for synthesizing effect sizes across studies. However, clinical studies frequently report alternative summary statistics, such as the median, quartiles, and range. To enable inclusion of such studies, various methods have been proposed to estimate the sample mean and SD from these reported summaries. We propose the Bayesian Order Statistics-based Estimator (BOSE), which leverages the joint likelihood of observed order statistics together with weakly informative priors to obtain the full posterior distribution for the mean and SD without relying on computationally intensive iterative procedures such as Markov chain Monte Carlo algorithms. Our numerical studies demonstrate that BOSE performs competitively with existing approaches in estimating the mean, while achieving superior performance for estimating the SD across all evaluated scenarios, particularly in small-sample settings. Under non-normal distributions including skewed, heavy-tailed, and bimodal settings with mild or moderate deviations from normality, BOSE remains robust and stable, whereas methods specifically designed for skewed distributions may become unstable or even inapplicable. Beyond point estimation, BOSE naturally provides empirically validated posterior credible intervals, enabling researchers to formally quantify uncertainty for study-level estimates and make reliable, evidence-based decisions in meta-analytic research synthesis. A publicly accessible web application implementing BOSE and competing methods is also provided to facilitate practical use in meta-analytic research.

9

An axiomatic approach to cultivar ranking in multi-environment trials

Kondratev, A. Y.; Ianovski, E.; Voronina, E.; Crossa, J.

2026-07-01 genetics 10.64898/2026.06.27.734959 medRxiv

Top 1%

0.9%

Multi-environment trials are central to cultivar evaluation because they reveal how candidate cultivars perform across locations, years, management conditions, and stress environments. The resulting yield matrix is a rich source of data on genotype-by-environment interaction, and a wide literature on estimation, decomposition, visualisation, and prediction of yield potential and stability has flourished. However, the ultimate question of which cultivar to recommend on the basis of such a matrix is often left implicit. The question is far from trivial, and in this paper we formulate cultivar recommendation as an axiomatic ranking problem. This framework is rich enough to encompass the existing literature on stability indices, as well as any other deterministic ranking procedure. We show that many commonly used stability-based procedures can violate minimal criteria of efficiency or consistency. The result of such violations is that a cultivar with uniformly high yield could be ranked below a cultivar with uniformly low yield, or the relative ranks of two cultivars could depend on whether or not a third cultivar is present in the matrix. Our results prove that under a small number of such criteria the space of admissible rules collapses to the family of power means and their limiting cases. If we further wish to allow multiplication normalisation of yield, we are left with the geometric mean as the unique solution.
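
To make the closing claim concrete, here is a small sketch of ranking by power means and of why the geometric mean is unaffected when one environment's yields are rescaled by a positive constant. The abstract does not list its axioms, so this only illustrates a well-known property of the geometric mean, not the paper's proof; the function names and the toy yield matrix are hypothetical.

```python
import math
from typing import Sequence


def power_mean(yields: Sequence[float], p: float) -> float:
    """Power mean of strictly positive yields.

    p = 0 is the geometric mean; p = +inf and p = -inf are the max and min.
    """
    if any(y <= 0 for y in yields):
        raise ValueError("yields must be strictly positive")
    n = len(yields)
    if p == 0:
        return math.exp(sum(math.log(y) for y in yields) / n)
    if p == math.inf:
        return max(yields)
    if p == -math.inf:
        return min(yields)
    return (sum(y ** p for y in yields) / n) ** (1.0 / p)


def rank(matrix: dict[str, Sequence[float]], p: float) -> list[str]:
    """Cultivars ordered from best to worst by their power mean across environments."""
    return sorted(matrix, key=lambda cultivar: power_mean(matrix[cultivar], p), reverse=True)


# Toy yield matrix: rows are cultivars, columns are two environments.
trial = {"A": [10.0, 1.0], "B": [4.0, 4.0]}
# Same data with the second environment expressed in units ten times smaller.
rescaled = {name: [ys[0], ys[1] * 10.0] for name, ys in trial.items()}

arithmetic_before, arithmetic_after = rank(trial, 1), rank(rescaled, 1)
geometric_before, geometric_after = rank(trial, 0), rank(rescaled, 0)
```

In this toy case the arithmetic-mean ranking flips after rescaling (A leads before, B after), while the geometric-mean ranking stays the same, because rescaling one environment multiplies every cultivar's geometric mean by the same factor.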

10

ProtAug: An Empirical Investigation of pLM-Guided Data Augmentation for Protein Sequence Prediction Tasks

Chen, Z.; Wang, R.; Luo, Q.

2026-07-11 bioinformatics 10.64898/2026.07.10.737545 medRxiv

Top 1%

0.8%

Protein language models (pLMs) offer great potential for protein sequence analysis, yet the scarcity of labeled data often limits their effectiveness in fine-tuning. Data augmentation is a promising remedy, but systematic evaluation of augmentation strategies for protein sequences remains limited, and the conditions under which augmentation confers downstream benefits are not well understood. In this paper, we systematically investigate pLM-guided substitution-based augmentation across seven protein prediction tasks. We propose ProtAug, a framework that leverages encoder-based (ESM-2) and autoregressive (ProtGPT2) pLMs to generate augmented sequences with user-controlled variation levels. Our investigation focuses on four questions: (Q1) whether pLM-synthesized sequences preserve more original signals than simpler methods, (Q2) to what extent augmentation improves prediction performance, (Q3) how variation levels affect downstream accuracy across tasks and models, and (Q4) whether biological plausibility is a necessary condition for achieving improvement. Our experimental results show that: (1) ProtAug Esm generally preserves motifs and structural similarity better than simple substitution, often comparable to homology retrieval; (2) augmentation yields consistent but task-dependent improvements, with ProtAug Esm achieving the best or second-best performance in 5 out of 7 tasks at 10% variation; (3) low-to-moderate variation levels (2-30%) perform best overall, although high-variation augmentation can benefit certain structure-related tasks; (4) the necessity of biological plausibility is task- and variation-dependent: while semantic preservation correlates with performance at low-to-moderate variation levels, improved generalization at high variation levels suggests that regularization effects, rather than label preservation, can also drive performance gains.

11

GR-SAFS: A Graph-Regularized Stacking Framework with Adaptive Feature Selection for High-Dimensional Prognostic Biomarker Discovery

He, J.; Guan, J.

2026-06-28 bioinformatics 10.64898/2026.06.23.733986 medRxiv

Top 1%

0.8%

Identifying prognostic biomarkers from high-dimensional transcriptomic data poses a triple challenge: achieving sparsity, preserving biological network topology, and integrating complementary nonlinear signals. Existing methods typically ignore network structure, miss nonlinear interactions, or lack a principled mechanism to fuse heterogeneous model outputs. We introduce GR-SAFS (Graph-Regularized Stacking with Adaptive Feature Selection), a framework with three modules: a Graph-Lasso engine embedding gene co-expression network Laplacian priors, run in parallel with a Random Forest engine; an empirical cumulative distribution function (eCDF) alignment layer that places sparse and dense importances on a common percentile scale; and a diversity-penalized quadratic programming router whose strict convexity yields a unique global optimum. On the TCGA-LUAD cohort, GR-SAFS identifies a 20-gene signature with a training concordance index of 0.700. Across two independent cross-platform microarray cohorts, GR-SAFS is the only method whose frozen signature retains statistically significant risk stratification in every cohort, where stronger-C-index baselines lose significance on at least one external cohort. Functional enrichment anchors the signature to a coherent Wnt/β-catenin axis. An open-source implementation is released for full reproducibility.

12

GeneBench-Pro: Evaluating Multistage Statistical Reasoning in Genomics, Quantitative Biology, and Translational Biomedicine

Li, J. H.; Ho, A. J.

2026-06-30 bioinformatics 10.64898/2026.06.29.735386 medRxiv

Top 1%

0.8%

We introduce GeneBench-Pro, an expanded and improved version of GeneBench that comprises harder problems across a wider breadth of domains. GeneBench-Pro is a benchmark for AI agents performing realistic multi-stage scientific analyses in genomics, quantitative biology, and translational biomedicine which seeks to capture the complexity of real-world problems that computational life scientists face when tasked with producing a conclusion upon which a downstream scientific or translational decision is contingent. The benchmark comprises 129 evaluations targeting quantities of direct practical relevance across 10 primary domains and 21 terminal subdomains, with a genomics-centered core. Similarly to GeneBench, each problem provides the agent with brief context, a target estimand, and minimal guidance otherwise; the agent must then navigate multiple dependent decision points; i.e., substantive inferential forks where a plausible wrong choice changes the downstream analysis, to identify and execute the correct analysis workflow and arrive at the correct answer. Relative to GeneBench, GeneBench-Pro adds 29 new problems, drops three, and introduces significantly redesigned versions of 54 of the remaining 100 overlapping problems. 82 of the 129 problems were reviewed by external domain experts, whose findings led to prompt/data modifications and redesign of those problems whose targets were not sufficiently identifiable. Ten externally reviewed problems are released publicly, 50 held-out problems were provided to Artificial Analysis for independent third-party model benchmarking, and the remainder are retained as an internal holdout. In evaluations over the full 129-problem suite, GPT-5.6 Sol reaches an eval-level pass rate of 28.7% at the max reasoning level, and GPT-5.6 Sol Pro reaches 31.5% in separately reported GPT Pro runs. GPT-5.5 reaches 12.0%, GPT-5.4 reaches 8.9%, and the strongest non-GPT baseline, Claude Opus 4.8, reaches 16.0%. 
As with GeneBench, models often complete substantial portions of the workflow but exhibit a consistent gap between noticing and acting by identifying local diagnostic signals but failing to propagate the implications to the corresponding analysis decision. As a result, models often select wrong estimators or persist on initially plausible but incorrect analysis paths. GeneBench-Pro therefore measures an emerging capability of long-horizon biological reasoning that remains unreliable.

13

The Variance-Stabilizing Transformation for the Poisson Rate Ratio: Closed-Form Confidence Intervals

Ng, S.-P.

2026-07-18 epidemiology 10.64898/2026.07.16.26358255 medRxiv

Top 1%

0.8%

The incidence rate ratio R is the standard measure for comparing event rates in clinical trials and epidemiology. In vaccine trials, the vaccine efficacy is VE = 1 - R. When events are rare, the two arm counts are Poisson. The estimator of R is heteroskedastic: its sampling variance changes with the data. So no fixed-width interval covers correctly everywhere. The usual log-Wald interval is undefined at zero events and covers poorly at small counts. Early vaccine and drug-safety readouts fall in exactly this regime. We show that a single reparameterization collapses this bivariate problem to an effective one-parameter family with a quadratic variance function, whose variance-stabilizing transformation is 2 arcsinh(sqrt(R)). The reduction yields a closed-form confidence interval for R. Its two leading errors, a curvature bias and the variability of the estimated scale, each admit a closed-form correction with no tuning constants. In a Monte Carlo study of our seven arcsinh variants and five competitors, the +Curve+Stu variant covers within 0.002 of the nominal 0.95 for about 50 control and 5 treatment events. Its width is on par with the best competitor. It avoids the conservatism and zero-count breakdown of log-Wald and MOVER. For moderate counts, we recommend this interval; for sparser data, our Bar-Lev and Enis count-shift variant is more robust. The result is a ready-to-use, closed-form interval for the low-count regime. We illustrate it on early Covid-19 vaccine-efficacy readouts and provide reference implementations in R and Python.
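
The link between a quadratic variance function and the transformation 2 arcsinh(sqrt(R)) can be checked numerically. The sketch below assumes the variance function is proportional to R(1 + R), which the abstract does not spell out, and uses the standard variance-stabilizing recipe that the derivative of the transformation should equal 1 / sqrt(V(R)). It does not reproduce the paper's confidence interval or its corrections, and the function names are hypothetical.

```python
import math
from typing import Callable


def vst(r: float) -> float:
    """Candidate variance-stabilizing transformation g(R) = 2 * arcsinh(sqrt(R))."""
    return 2.0 * math.asinh(math.sqrt(r))


def assumed_sd(r: float) -> float:
    """Square root of the ASSUMED variance function V(R) = R * (1 + R), up to a constant."""
    return math.sqrt(r * (1.0 + r))


def numerical_derivative(f: Callable[[float], float], x: float, h: float = 1e-6) -> float:
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def stabilization_ratio(r: float) -> float:
    """g'(R) * sqrt(V(R)); a value of 1 at every R means the variance is stabilized."""
    return numerical_derivative(vst, r) * assumed_sd(r)


ratios = [stabilization_ratio(r) for r in (0.05, 0.5, 1.0, 4.0)]
```

Under the stated assumption each ratio is 1 up to finite-difference error, which follows analytically from d/dR [2 arcsinh(sqrt(R))] = 1 / sqrt(R (1 + R)).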

14

Recalibrating Mendelian randomization under winner's curse, sample structure and polygenicity

Yang, Y.; Lin, Z.; Xue, H.; Zhu, X.

2026-07-07 genetic and genomic medicine 10.64898/2026.06.25.26356593 medRxiv

Top 1%

0.8%

Recently, Hu et al. (2024) conducted a benchmarking study showing that most existing Mendelian randomization (MR) methods exhibit substantial bias and inflated type-I error rates in real data. They attributed these failures to two largely neglected sources of bias: winner's curse and polygenicity-induced bias. Although a few methods have been developed to address one or both of these issues, existing approaches either do not fully account for both biases or are restricted to the univariable setting. In this paper, we propose a multivariable Rao-Blackwellization that corrects winner's curse while accounting for polygenicity and sample structure in a unified framework. Unlike univariable Rao-Blackwellization, where instrument selection yields a truncated normal statistic amenable to a Mills-ratio correction, multivariable Rao-Blackwellization conditions on a noncentral $\chi^2$ statistic, for which no analogous correction is available. We derive closed-form conditional moments under this instrument selection model and use them to construct bias-corrected summary statistics that can be integrated into a wide range of existing MR methods. Simulations and real data analyses show that, when combined with methods such as MR-cML and MR-BEE, the proposed correction substantially improves type-I error control and yields more robust inference.

15

A retrospective study of a Chinese vision-language large model for emergency 3D brain CT interpretation

Chen, Y.; Zheng, J.; Wang, Y.; Wu, B.; Li, L.; Liu, M.; Xu, L.; Wu, Y.; Liu, C.; Guo, L.; Yang, H.; Bai, X.; Qin, F.; Liao, Q.; Gu, Y.; Zhao, G.; Ma, L.; Pan, K.; Guo, J.; Zhou, Y.; Sun, H.; Tian, Q.

2026-07-13 health informatics 10.64898/2026.07.11.26357421 medRxiv

Top 1%

0.8%

Emergency brain computed tomography (CT) is the first line imaging modality for patients with acute neurological symptoms and trauma, where delayed or incomplete recognition of critical findings can directly compromise clinical outcomes. However, emergency CT interpretation and Chinese reporting remain highly variable under severe time constraints and heterogeneous institutional settings. In this study, we develop ERBrain, a multimodal large model specifically tailored for emergency brain CT, which jointly performs three-dimensional image understanding, Chinese radiology report generation, and emergency severity triage within a unified framework. ERBrain integrates volumetric visual representations with a Chinese large language model and explicitly prioritizes emergency critical signs through risk focused training objectives and a lightweight knowledge-augmented prompting strategy. Using more than 10,000 multicentre emergency CT studies, ERBrain achieved an accuracy of 0.943 and a balanced accuracy of 0.940 for three-level emergency triage and achieved the highest FIES-Avg clinical semantic score among the evaluated report-generation models in the in-distribution cohort. Across external data, ERBrain maintained favourable triage performance in two cross-institutional validation cohorts, whereas performance was lower but remained clinically informative in a third cohort characterized by an extremely low prevalence of Positive cases. These findings support further prospective evaluation of ERBrain as a radiology worklist prioritization and report-drafting assistant in heterogeneous emergency imaging settings.

16

Neural Processes with Normalizing Flows for Wheat Height Estimation

Boss, M.; Volpi, M.; Roth, L.

2026-07-09 plant biology 10.64898/2026.06.24.734247 medRxiv

Top 1%

0.7%

In this work, we investigate modeling plant traits over time using neural processes, a class of machine learning models that learn distributions over functions. Plant growth is an inherently stochastic process with complex dynamics measured mostly at irregular times throughout the growing seasons. While individual trait trajectories may be simple, their distributions are shaped by complex interactions between genotype, environment, and other factors. In particular, we focus on plant height in wheat, a deceptively simple-looking trait with complex dynamics. To model these trajectory distributions, we evaluate neural processes and in particular extensions using normalizing flows, with different combinations of genotype and environmental covariates. For controlled evaluations, we generate synthetic wheat height trajectories calibrated against Swiss weather station records and the FIP1 dataset. To fully evaluate these trajectory distributions, we use signatures, vector representations of sequential data, together with Sig-MMD and the recently introduced CSig-MMD. Sig-MMD enables direct pathwise comparison of predicted and simulator trajectory distributions, while CSig-MMD focuses this comparison on the tail, including lodged trajectories. Together, these metrics allow us to assess whether the models capture the full distribution of growth trajectories, including rare outcomes.

17

Integrating Genetic, Environmental, Cognitive, and Temperament Data for ADHD Prediction in Explainable Deep Learning Models

Barnett, E. J.; Mooney, M. A.; Zhang-James, Y.; Ryabinin, P.; Faraone, S. V.

2026-07-01 genetic and genomic medicine 10.64898/2026.06.29.26356796 medRxiv

Top 1%

0.6%

Objective: Attention-deficit/hyperactivity disorder (ADHD) is clinically and etiologically heterogeneous, and diagnostic decisions may benefit from integrating multiple sources of information. We developed an explainable deep learning approach to test whether genetic, environmental, cognitive, demographic, and temperament data could classify ADHD diagnosis and identify features contributing to model decisions. Method: We analyzed participants from the Oregon ADHD-1000 cohort split into training, validation, and test subsets. We trained modular neural network models classifying ADHD case-control status using SNP-level genotype data with biological annotations, polygenic scores, demographics, parenting and family conflict, stress and trauma, geocoded measures, cognitive task measures, temperament factor scores, and missingness indicators. Hyperparameter optimization selected model architecture and feature block inclusion. We evaluated model performance using AUC, precision-recall curves, calibration analyses, prediction certainty analyses, and decision curve analysis. We used integrated gradients to quantify block-level, feature-level, and individualized feature importance. Results: The best model using temperament features had an AUC of 0.97 in the held-out test subset, with high accuracy, sensitivity, and specificity and a Brier score of 0.06. The best model excluding temperament had an AUC of 0.75. Feature importance analyses highlighted temperament, demographic, and cognitive domains in the temperament-inclusive model. Individualized explanations showed that prediction drivers varied across participants and could help reveal conflicting or supporting evidence across domains. Conclusion: Explainable, multi-modal classification models can integrate heterogeneous ADHD-relevant information and identify features that contribute to individual predictions. These types of models may advance ADHD risk modeling research and clinician-led decision support, especially in complex or diagnostically uncertain cases.
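
The integrated-gradients attribution named in the Method can be illustrated on a toy model. The sketch below is not the authors' code: it swaps their modular neural network for a hypothetical logistic model, and the weights, baseline, and feature blocks (`w`, `b`, `baseline`, `blocks`) are made-up illustration values. It approximates the path integral with a midpoint Riemann sum and sums feature attributions into block-level scores.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def integrated_gradients(x, baseline, w, b, steps=200):
    """Integrated gradients for F(x) = sigmoid(w.x + b), a toy stand-in model."""
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight line from the baseline to the input.
    path = baseline + alphas[:, None] * (x - baseline)
    p = sigmoid(path @ w + b)
    # Analytic gradient of the logistic model at each path point.
    grads = (p * (1.0 - p))[:, None] * w
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical toy values, chosen only for illustration.
w = np.array([1.5, -0.5, 0.0])
b = -0.2
x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)

attributions = integrated_gradients(x, baseline, w, b)

# Block-level importance: sum feature attributions within each (hypothetical) block.
blocks = {"block_a": [0, 1], "block_b": [2]}
block_scores = {name: float(attributions[idx].sum()) for name, idx in blocks.items()}

if __name__ == "__main__":
    print("feature attributions:", attributions)
    print("block attributions:", block_scores)
```

A useful sanity check is the completeness property: the attributions should sum to the difference between the model output at the input and at the baseline, up to the numerical integration error.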

18

A multimodal foundation model for emergency head CT interpretation

Zheng, J.; Chen, Y.; Wu, B.; Wang, Y.; Liu, M.; Li, L.; Jiang, S.; Chen, W.; Xu, L.; Wu, Y.; Liu, C.; Guo, L.; Bai, X.; Li, Z.; Yang, H.; Qin, F.; Liu, J.; Qu, H.; Liao, Q.; Zhao, G.; Pan, K.; Guo, J.; Chen, L.; Zhou, Y.; Sun, H.; Tian, Q.

2026-07-09 health informatics 10.64898/2026.07.07.26357429 medRxiv

Top 1%

0.6%

Non-contrast head CT is the first-line imaging modality for acute neurological emergencies, with demand rising worldwide. However, existing foundation models for head CT interpretation are ill-suited for emergency use because they target general or chronic-disease assessment and optimize reports for lexical overlap rather than the risk-relevant findings central to emergency triage. Here we present CHIEF, a Chinese-language Head CT Interpretation Emergency Foundation model, pretrained on emergency head CT volumes and paired reports with contrastive, generative, and geometry-regularization objectives. Trained and evaluated on 16,563 examinations from seven hospitals, CHIEF achieved an AUROC of 0.9646 for emergency triage and drafted triage-oriented radiology reports, while also supporting image-to-text retrieval for reference-case support and zero-shot abnormality recognition. CHIEF generated reports of substantially higher quality than those from commercial multimodal large language models, and radiologists could not reliably distinguish its reports from human-written ones in a blinded Turing test. Overall, CHIEF provides a generalizable foundation for emergency head CT interpretation and radiologist-in-the-loop clinical decision support.

19

Overinflation and overconcentration: why Cauchy perturbation kernels are the right choice for ABC-SMC

Sturrock, M.; Shahrezaei, V.

2026-07-09 systems biology 10.64898/2026.06.24.734205 medRxiv

Top 1%

0.6%

Approximate Bayesian computation sequential Monte Carlo (ABC-SMC) propagates its particles with a perturbation kernel, and with the standard Normal kernel it degrades sharply as the parameter dimension grows, a failure usually attributed to dimension itself. We show instead that it is governed by the quality of the summary statistics, with dimension entering only through a separate and milder mechanism, and that the two must act together for the Normal kernel to break. The first ingredient is covariance overinflation: the kernel covariance, estimated from the particle cloud, overshoots the true posterior covariance by a factor set by information loss in the summary statistics. We derive this overscaling factor in closed form for a Gaussian model with sufficient statistics and show that it stays modest at any dimension, shrinking toward its baseline value as the tolerance tightens; the extreme values seen in practice (of order 10^3) are a signature of insufficient summaries, not of dimension. The second ingredient is perturbation overconcentration: the normalised Normal step size concentrates around one as the dimension grows, so every proposal overshoots by the same factor. Either ingredient alone is harmless; only their combination breaks the Normal kernel. A Cauchy kernel (multivariate t with one degree of freedom) removes the concentration, keeping a positive acceptance rate under arbitrary overscaling at a bounded worst-case cost of 1.87x in expected squared jump distance. In a Metropolis-Hastings framework we derive closed-form acceptance rates for both kernels that illustrate the advantage of the Cauchy kernel in this limit. A series of full ABC-SMC computational experiments on five problems at d = 12, including a hierarchical gene-expression model, shows that the Cauchy kernel reduces the sliced Wasserstein distance to the reference posterior by factors of up to 50 with the same simulation budget. Since the summary statistics are commonly insufficient for the models that require ABC, overinflation is structural and the Cauchy perturbation kernel is the right default for problems in higher dimensions.
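
The overconcentration effect described above can be checked numerically. The sketch below is an illustration, not the authors' code. It assumes "normalised step size" means the Euclidean norm of a unit-covariance perturbation divided by sqrt(d), and it builds the multivariate Cauchy draw as a standard Normal vector divided by the absolute value of an independent standard Normal scalar (the usual construction of a multivariate t with one degree of freedom). The function names are hypothetical.

```python
import numpy as np

def normalised_step_sizes(d, n=20000, seed=0):
    """Return ||step|| / sqrt(d) for n Normal and n Cauchy perturbations in d dims."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, d))
    normal_r = np.linalg.norm(z, axis=1) / np.sqrt(d)
    # Multivariate t with 1 dof: z / sqrt(chi2_1), i.e. z / |g| with g ~ N(0, 1).
    g = rng.standard_normal(n)
    cauchy_r = normal_r / np.abs(g)
    return normal_r, cauchy_r

def iqr(values):
    """Interquartile range, a spread measure that exists for heavy tails."""
    q75, q25 = np.percentile(values, [75, 25])
    return q75 - q25

if __name__ == "__main__":
    for d in (2, 12, 100):
        normal_r, cauchy_r = normalised_step_sizes(d)
        print(f"d={d:3d}  Normal IQR={iqr(normal_r):.3f}  Cauchy IQR={iqr(cauchy_r):.3f}")
```

Under these assumptions the spread of the Normal step size shrinks roughly like 1/sqrt(2d), while the Cauchy step size keeps a wide spread at every dimension, which is the behaviour the abstract relies on.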

20

Analytical perturbation reveals hidden instability of biological phenotypes

Piorkowska, N. J.; Ostromecki, A.; Franik, G.; Bizon, A.

2026-07-16 endocrinology 10.64898/2026.07.13.26357916 medRxiv

Top 1%

0.6%

Background: Unsupervised machine learning has become a cornerstone of computational phenotyping across clinical medicine, genomics, imaging, and multi-omics research. However, phenotype discovery relies on a sequence of analytical decisions - including missing-data handling, preprocessing, dimensionality reduction, clustering methodology, and stochastic initialization - that are rarely evaluated collectively. Although clustering stability has been extensively investigated, the robustness of complete analytical workflows remains largely unexplored. Results: We developed an Analytical Perturbation Framework that systematically quantifies the robustness of phenotype discovery by perturbing complete unsupervised learning workflows rather than individual clustering algorithms. Using a real-world cohort of 1,286 women with polycystic ovary syndrome (PCOS), we generated 116 valid analytical pipelines comprising alternative preprocessing strategies, missing-data handling methods, dimensionality reduction approaches, clustering algorithms, and random initializations. Agreement between independently generated phenotype solutions was consistently low (median Adjusted Rand Index = 0.079), indicating substantial sensitivity of phenotype discovery to routine analytical decisions. Variance decomposition identified preprocessing as the largest contributor to phenotype instability (22.8%), followed by clustering methodology (14.6%), whereas stochastic initialization explained only 3.1% of the observed variability. At the patient level, most individuals exhibited reproducible phenotype assignments (median Patient Robustness Score = 0.719), although a substantial subgroup showed markedly lower assignment stability. Feature perturbation analyses identified follicle-stimulating hormone, anti-thyroglobulin antibodies, anti-thyroid peroxidase antibodies, total testosterone, luteinizing hormone, and androstenedione as the strongest contributors to computational robustness, rather than biological importance. Finally, phenotype solutions demonstrating greater computational robustness also exhibited greater biological coherence during independent validation.
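
The agreement measure quoted above, the median Adjusted Rand Index (ARI) across pairs of phenotype solutions, can be sketched in a few lines. This is a minimal illustration using the standard ARI formula, not the authors' pipeline; the function names and the tiny label vectors are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb
from statistics import median

def adjusted_rand_index(labels_a, labels_b):
    """Standard ARI between two cluster labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0
    # Pairs of items placed together in both, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every item in one cluster.
        return 1.0
    return (together_both - expected) / (max_index - expected)

def median_pairwise_ari(solutions):
    """Median ARI over all unordered pairs of clustering solutions."""
    return median(adjusted_rand_index(a, b) for a, b in combinations(solutions, 2))

if __name__ == "__main__":
    # Hypothetical labelings of four patients from three pipelines.
    solutions = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
    print(median_pairwise_ari(solutions))
```

ARI is invariant to relabeling of clusters and is close to zero for chance-level agreement, which is why a median of 0.079 indicates that independently generated solutions barely agree.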